FIN7030
2025-01-21
Barry Quinn in Home Attic Office
:fat[Please read my communication policy to get feedback]
From the Stanford Institute for Human-Centered Artificial Intelligence (HAI)
:saltinline[Key takeaways]
:small[Click here for full report]
Historically, the gains from technology adoption waves have empowered some humans, typically to the detriment of others
This time could be different, as AI is the first tool in history:
Such properties have led to the development of the field of AI Safety
For more details listen to Yuval Noah Harari on AI Safety
AIMA (Russell and Norvig's *Artificial Intelligence: A Modern Approach*) provides four definitions based on Thought vs. Action
Summary: Leveraging computational models to simulate human cognitive processes.
Examples:
Neural Networks: Used for credit scoring by analyzing a large dataset of customer information to predict creditworthiness
Cognitive Architectures: Designing intelligent systems to simulate traders’ decision-making processes in stock markets
Summary: Constructing AI systems capable of logical inference to symbolize knowledge and solve complex problems.
Examples:
Classical AI: Developing rule-based systems for regulatory compliance and monitoring
Expert Systems: Creating platforms that offer financial advice based on a vast knowledge base and rules set by financial experts
Summary: Crafting rational agents optimized to take the most beneficial actions based on their perceived understanding of the world.
Examples:
Decision-Making Algorithms: Algorithms facilitating high-frequency trading by making rapid decisions based on market conditions
Planning: Utilizing AI in strategic financial planning and asset management to maximize returns
Learning Techniques: Applying reinforcement learning in algorithmic trading to learn and adapt trading strategies continuously based on market dynamics
CRISP-DM stands for Cross Industry Standard Process for Data Mining
Microsoft AI life cycle
This is a more “agile and iterative data science methodology”
Microsoft AI life cycle FinTech Model proposed by Haakman et al. (2021)
Context is king in statistics
Financial datasets used to solve modern investment problems pose unique challenges that are beyond many plug-and-play data science algorithms
The era of “Big Data” has provided a backdrop for the rapid expansion of computationally intensive algorithms, for instance random forests for prediction
The importance of inferential arguments in support of ML applications has emerged as an exciting (yet underdeveloped) field
This is particularly true for financial research questions, where the complexity of the data story (or, more formally, the data-generating process which underpins the sample) results in notoriously noisy covariance matrices
Only a small percentage of the information these matrices contain is signal, and it is systematically suppressed by arbitrage forces
This course will introduce best-practice techniques in financial data science which can help elicit economically meaningful signal and answer contemporary financial research questions
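The signal-versus-noise problem in covariance matrices can be previewed with a minimal sketch (assuming NumPy is available, with purely illustrative simulated data): for uncorrelated returns, random matrix theory's Marchenko-Pastur distribution bounds the eigenvalues attributable to estimation noise, which is the basis of the denoising techniques covered in week 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate T observations of N uncorrelated assets: the true covariance
# is the identity, so eigenvalues of the sample covariance below the
# Marchenko-Pastur upper edge are pure estimation noise.
T, N = 500, 100
X = rng.standard_normal((T, N))
sample_cov = np.cov(X, rowvar=False)

# Marchenko-Pastur upper edge for unit variance and ratio q = N / T
q = N / T
lambda_max = (1 + np.sqrt(q)) ** 2

eigvals = np.linalg.eigvalsh(sample_cov)
noise_share = np.mean(eigvals <= lambda_max)
print(f"eigenvalues within the noise band: {noise_share:.0%}")
```

With real financial data, the eigenvalues above the noise band carry the (scarce) signal; denoising replaces the rest with a constant.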
40% Critical Generative AI project on Trading
60% End-of-term computer-based practical test
| Topic | Week |
|---|---|
| Why study Financial AI? | 1 |
| High-performance cloud computing in finance | 2 |
| Denoising and detoning | 3 |
| Distance metrics | 4 |
| Optimal clustering | 5 |
| Explainable Artificial Intelligence | 6 & 7 |
| Testing set overfitting | 8 & 9 |
| Round up | 10 |
Hedge-fund firm Two Sigma has built a computing system with more than 100 teraflops of power, which is 100 trillion calculations per second
| Property | Example | Algorithmic Goal |
|---|---|---|
| Unstructured, non-numerical and/or non-categorical | News articles, voice recordings, satellite images | Sentiment extraction, commodity supply shocks |
| High-dimensional | Credit card transactions | Fraud detection |
| Sparse, containing NaNs (not-a-number) | Mixed-frequency data | Economic nowcasting |
| Implicitly contains information about networks of agents in a system | Trade order book data | Black swan event detection |
Classical econometrics fails on these big datasets
Linear algebra methods (e.g. OLS) can fail in high-dimensional data, where there can be more variables than observations
Geometric objects, like covariance matrices, fail to recognise the topological relationships that characterise networks
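The first failure mode can be illustrated in a few lines with hypothetical simulated data: when there are more variables than observations, \(X'X\) is rank-deficient, so the classical OLS estimator is not defined.

```python
import numpy as np

rng = np.random.default_rng(1)

# More variables (p = 50) than observations (n = 20): X'X cannot be
# inverted, so the OLS normal equations have no unique solution.
n, p = 20, 50
X = rng.standard_normal((n, p))
xtx = X.T @ X

# The rank of X'X is at most n, far short of the p needed for invertibility
rank = np.linalg.matrix_rank(xtx)
print(rank, "<", p)
```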
Financial machine learning (FML) offers the numerical power and functional flexibility needed to identify complex patterns in high-dimensional spaces
High-dimensional datasets with many features (predictors) may have complex patterns
For \(p\) features there may be up to \(2^p-p-1\) interaction effects
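The combinatorial growth implied by the \(2^p-p-1\) formula can be checked directly; the helper below is illustrative only.

```python
from itertools import combinations

def n_interactions(p: int) -> int:
    """Number of possible interaction effects among p features:
    every subset of two or more features, i.e. 2**p - p - 1."""
    return 2 ** p - p - 1

# Cross-check against explicit enumeration for a small p
p = 5
enumerated = sum(1 for k in range(2, p + 1)
                 for _ in combinations(range(p), k))
assert enumerated == n_interactions(p)  # 26 for p = 5

print(n_interactions(20))  # over a million candidate interactions
```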
Unlike ML algorithms, econometric models do not learn the structure of the data
In classical linear regression, the model specification may easily miss some interactions. An ML algorithm such as a decision tree, by contrast, will recursively partition a dataset into subsets with simple patterns, which can then be fit independently with simple linear specifications
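A toy sketch of why partitioning helps, on simulated data where the slope of one feature flips with the sign of another: a pooled linear fit finds almost nothing, while fitting each partition separately recovers the relationship, which is what a decision tree's recursive splits automate.

```python
import numpy as np

rng = np.random.default_rng(2)

# y depends on an interaction: the slope of x2 flips with the sign of x1.
# A single linear fit averages the two regimes away; splitting on x1
# first (as a decision tree would) leaves a simple model in each leaf.
x1 = rng.standard_normal(400)
x2 = rng.standard_normal(400)
y = np.sign(x1) * x2

def r2_linear(x, y):
    # R^2 of a univariate least-squares fit of y on x
    beta = np.polyfit(x, y, 1)
    resid = y - np.polyval(beta, x)
    return 1 - resid.var() / y.var()

pooled = r2_linear(x2, y)                   # near zero
left = r2_linear(x2[x1 < 0], y[x1 < 0])     # near one
right = r2_linear(x2[x1 >= 0], y[x1 >= 0])  # near one
print(pooled, left, right)
```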
Finance problems that are fundamentally about prediction are easily reimagined in the FML paradigm
Measurement of an asset’s risk premium is fundamentally a prediction problem where the risk premium is the conditional expectation of a future realised excess return
Gu, Shihao, Bryan Kelly, and Dacheng Xiu. 2020. “Empirical Asset Pricing via Machine Learning.” The Review of Financial Studies. https://doi.org/10.1093/rfs/hhaa009
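The "risk premium as conditional expectation" framing can be sketched on simulated data (the signal, coefficient, and sample sizes below are illustrative assumptions, not estimates from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy DGP: next period's excess return r_{t+1} has a predictable
# component b * x_t (the risk premium) plus unpredictable noise
T, b = 2000, 0.3
x = rng.standard_normal(T)
r = b * x + rng.standard_normal(T)  # r_{t+1}, aligned with x_t

# Estimate the conditional expectation on the first half of the
# sample, then predict the second half out-of-sample
half = T // 2
beta = np.polyfit(x[:half], r[:half], 1)
pred = np.polyval(beta, x[half:])
oos_r2 = 1 - np.mean((r[half:] - pred) ** 2) / np.var(r[half:])
print(f"out-of-sample R^2: {oos_r2:.3f}")
```

The out-of-sample \(R^2\) is small but positive, mirroring the modest predictive gains that matter economically in asset pricing.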
Methods that can reliably attribute excess returns to tradable anomalies are highly prized
Robo-Advisors
Fraud detection
Cryptocurrencies
AI Trading Strategies
🤔
Very broadly speaking, algorithms are what statisticians do while inference says why they do them. A particularly energetic brand of the statistical enterprise has flourished in the new century, data science, emphasizing algorithmic thinking rather than its inferential justification.
— Efron and Hastie, 2016
— Lopez de Prado, 2019
🤔
If a machine can think, it might think more intelligently than we do, and then where should we be? Even if we could keep the machines in a subservient position … we should, as a species, feel greatly humbled.
— Alan Turing, 1951
The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.
— Irving J. Good, 1965
| Paradigm | Goal | Examples |
|---|---|---|
| Supervised Learning | Using labelled data the goal is to learn the relationship between \(X\) and \(Y\) | Random Forests, Extreme Boosted Trees, Recurrent Neural Networks |
| Unsupervised Learning | Given a set of unlabelled data, the goal is to retrieve exploratory information about groupings or hidden patterns | Hierarchical clustering, \(k\)-means clustering, hidden Markov models, Gaussian mixtures |
| Reinforcement learning | An algorithmic approach to Bellman optimality of a Markov Decision Process | A form of dynamic programming used for decisions leading to optimal trade execution, portfolio allocation, and liquidation over a given horizon |
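As a concrete instance of the unsupervised paradigm in the table, here is a minimal \(k\)-means sketch on toy two-cluster data (NumPy only; initialisation is deliberately simplified by seeding the centroids with one point from each end of the sample):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: two well-separated clusters in a 2-D feature space
a = rng.normal(loc=-2.0, scale=0.3, size=(50, 2))
b = rng.normal(loc=+2.0, scale=0.3, size=(50, 2))
X = np.vstack([a, b])

# Seed one centroid near each cluster (simplified initialisation),
# then alternate assignment and centroid-update steps
centroids = np.stack([X[0], X[-1]])
for _ in range(10):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(np.round(centroids, 1))
```

No labels are supplied; the groupings emerge from the data alone, which is the defining feature of the paradigm.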
Examples: Decision tree, neural network
Example: Restricted Boltzmann machine (RBM)
Supervised machine learning is often an algorithmic form of statistical model estimation in which the data generation process is treated as an unknown (Breiman 2001)
Model selection and inference is automated, with an emphasis on processing large amounts of data to develop robust models
It can be viewed as a highly efficient data compression technique designed to provide predictors in complex settings where relations between input and output variables are non-linear and input space is often high-dimensional
Machine learners balance filtering data against the goal of making accurate and robust decisions, which are often discrete and a categorical function of the input data
This fundamentally differs from the maximum likelihood estimators used in standard statistical models, which assume that the data were generated by the model and typically struggle with over-fitting, especially when applied to high-dimensional datasets
Given the complexity of modern datasets, whether they are limit order books or high-dimensional financial time series, it is increasingly questionable whether we can posit inference on the basis of a known data generation process
Even if an economic interpretation of the data generation process can be given, it is reasonable to assert that its exact form cannot always be known
Examples: OLS regression, neural networks, hidden Markov models, etc
Examples: kernel methods, support vector machines, Gaussian processes
Backtesting is not a good research tool
Easley, David, Marcos M. López de Prado, and Maureen O’Hara. 2012. “Flow Toxicity and Liquidity in a High-Frequency World.” The Review of Financial Studies 25 (5): 1457–93.
A successful theory will predict out-of-sample. Furthermore, it will explain not only positives (x causes y) but also negatives (the absence of y is due to the absence of x)
In a theory discovery process, ML plays the key role of decoupling the search for variables from the search for specification.
Classical statistical methods do not allow this decoupling of the two searches.
ML models are wrongly characterised as “oracles”
An oracle is a black box that is able to produce a solution for any instance of a given computational problem (the complexity-theory definition)
Recent scientific discoveries have revealed radically different uses of ML
Existence: ML has been deployed to evaluate the plausibility of a theory across many scientific fields
Importance: ML algorithms can determine the relative informational content of explanatory variables for explaining or predicting purposes
Causation: ML algorithms are often used to evaluate causal inference (causal random forests; Athey, 2015)
A standard approach in industry is to use historical data to backtest an investment strategy identified from the training set
Researchers who run multiple statistical tests on the same data set are more likely to make a false discovery
This selection bias comes from fitting the model to perform well on the test set, not the train set
Test set overfitting occurs when a researcher backtests a strategy until the output achieves a desired performance
The poor performance of a backtest should be a sign to fix the research process, not the investment strategy
Use the familywise error rate (FWER) or the Deflated Sharpe ratio
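The FWER logic can be sketched in a few lines; the Šidák correction below is the textbook adjustment for independent tests (the Deflated Sharpe ratio is a more involved calculation and is not shown):

```python
# Familywise error rate when K independent strategy backtests are each
# tested at significance level alpha: FWER = 1 - (1 - alpha)**K
def fwer(alpha: float, k: int) -> float:
    return 1 - (1 - alpha) ** k

# Sidak correction: the per-test level that keeps FWER at the target alpha
def sidak(alpha: float, k: int) -> float:
    return 1 - (1 - alpha) ** (1 / k)

k = 20
print(f"FWER after {k} tests at 5%: {fwer(0.05, k):.2f}")   # ~0.64
print(f"Sidak-adjusted level:       {sidak(0.05, k):.4f}")  # ~0.0026
```

Twenty independent backtests at the conventional 5% level give roughly a 64% chance of at least one false discovery, which is why per-test thresholds must be tightened.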
Use combinatorial purged cross-validation methods (CPCV), which generate many test sets using resampling combinatorial splits of train and test sets
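The combinatorial-splits idea behind CPCV can be sketched as follows; note that full CPCV also purges and embargoes observations near split boundaries to prevent leakage, a step omitted in this illustration:

```python
from itertools import combinations

# Partition the sample into n_groups contiguous blocks and use every
# combination of k_test blocks as a test set, yielding many distinct
# train/test splits instead of a single holdout.
def combinatorial_splits(n_groups: int, k_test: int):
    groups = range(n_groups)
    for test in combinations(groups, k_test):
        train = tuple(g for g in groups if g not in test)
        yield train, test

splits = list(combinatorial_splits(6, 2))
print(len(splits))  # C(6, 2) = 15 train/test splits
```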
Use historical series to estimate the underlying data-generating process, and use Monte Carlo methods to create fake/synthetic samples that match the statistical properties observed in history
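A minimal version of this Monte Carlo workflow, assuming an (unrealistically simple) i.i.d. Gaussian data-generating process fitted to sample moments; real applications would fit a richer DGP:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for a historical daily return series (simulated here)
hist = rng.normal(loc=0.0004, scale=0.01, size=2500)

# Step 1: estimate the data-generating process, here just the
# mean and volatility of an i.i.d. Gaussian model
mu, sigma = hist.mean(), hist.std()

# Step 2: draw synthetic samples from the fitted DGP; a strategy can
# then be backtested across many alternative histories, not just one
synthetic = rng.normal(mu, sigma, size=(1000, 2500))

# The synthetic paths match the fitted moments on average
print(synthetic.mean(), synthetic.std())
```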
Backtests cannot simulate black swans; only theories have the breadth and depth needed to consider never-before-seen occurrences
A backtest may insinuate that a strategy is profitable, but it does not tell us why
Only a theory can state the cause-effect mechanism, and formulate a wide range of predictions and implications that can be independently tested for facts and counterfactuals
Dixon et al., 2020, Machine Learning in Finance
AI and Trading